AITopics | last-iterate convergence

Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime

Neural Information Processing SystemsJun-20-2026, 03:56:42 GMT

We study population convergence guarantees of stochastic gradient descent (SGD) for smooth convex objectives in the interpolation regime, where the noise at optimum is zero or near zero. The behavior of the last iterate of SGD in this setting--particularly with large (constant) stepsizes--has received growing attention in recent years due to implications for the training of over-parameterized models, as well as to analyzing forgetting in continual learning and to understanding the convergence of the randomized Kaczmarz method for solving linear systems.

artificial intelligence, convergence, machine learning, (19 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

Efficient Last-Iterate Convergence in Solving Extensive-Form Games

Neural Information Processing SystemsJun-19-2026, 19:46:26 GMT

To establish last-iterate convergence for Counterfactual Regret Minimization (CFR) algorithms in learning a Nash equilibrium (NE) of extensive-form games (EFGs), recent studies reformulate learning an NE of the original EFG as learning the NEs of a sequence of (perturbed) regularized EFGs. Hence, proving last-iterate convergence in solving the original EFG reduces to proving last-iterate convergence in solving (perturbed) regularized EFGs. However, these studies only establish last-iterate convergence for Online Mirror Descent (OMD)-based CFR algorithms instead of Regret Matching (RM)-based CFR algorithms in solving perturbed regularized EFGs, resulting in a poor empirical convergence rate, as RM-based CFR algorithms typically outperform OMD-based CFR algorithms. In addition, as solving multiple perturbed regularized EFGs is required, fine-tuning across multiple perturbed regularized EFGs is infeasible, making parameter-free algorithms highly desirable. This paper show that CFR+, a classical parameter-free RM-based CFR algorithm, achieves last-iterate convergence in learning an NE of perturbed regularized EFGs. This is the first parameter-free last-iterate convergence for RM-based CFR algorithms in perturbed regularized EFGs. Leveraging CFR+ to solve perturbed regularized EFGs, we get Reward Transformation CFR+ (RTCFR+). Importantly, we extend prior work on the parameter-free property of CFR+, enhancing its stability, which is vital for the empirical convergence of RTCFR+. Experiments show that RTCFR+ exhibits a significantly faster empirical convergence rate than existing algorithms that achieve theoretical last-iterate convergence.

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Asia > China (0.28)
North America > Canada (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Information Technology (0.92)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Add feedback

Last-Iterate Convergence of Smooth Regret Matching + Variants in Learning Nash Equilibria

Neural Information Processing SystemsJun-15-2026, 09:33:10 GMT

Regret Matching+ (RM+) variants are widely used to build superhuman Poker AIs, yet few studies investigate their last-iterate convergence in learning a Nash equilibrium (NE). Although their last-iterate convergence is established for games satisfying the Minty Variational Inequality (MVI), no studies have demonstrated that these algorithms achieve such convergence in the broader class of games satisfying the weak MVI. A key challenge in proving last-iterate convergence for RM+ variants in games satisfying the weak MVI is that even if the game's loss gradient satisfies the weak MVI, RM+ variants operate on a transformed loss feedback which does not satisfy the weak MVI. To provide last-iterate convergence for RM+ variants, we introduce a concise yet novel proof paradigm that involves: (i) transforming an RM+ variant into an Online Mirror Descent (OMD) instance that updates within the original strategy space of the game to recover the weak MVI, and (ii) showing last-iterate convergence by proving the distance between accumulated regrets converges to zero via the recovered weak MVI of the feedback. Inspired by our proof paradigm, we propose Smooth Optimistic Gradient Based RM+ (SOGRM+) and show that it achieves last-iterate and finite-time best-iterate convergence in learning an NE of games satisfying the weak MVI, the weakest condition among all known RM+ variants. Experiments show that SOGRM+ significantly outperforms other algorithms. Our code is available at https://github.

artificial intelligence, convergence, machine learning, (14 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment > Games (0.68)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

Add feedback

No-regret learning in games with noisy feedback: Faster rates and adaptivity via learning rate separation

Neural Information Processing SystemsApr-25-2026, 06:01:24 GMT

We examine the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents: in this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret relative to fully adversarial environments. We study this problem in the context of variationally stable games (a class of continuous games which includes all convexconcave and monotone games), and when the players only have access to noisy estimates of their individual payoff gradients. If the noise is additive, the gametheoretic and purely adversarial settings enjoy similar regret guarantees; however, if the noise is multiplicative, we show that the learners can, in fact, achieve constant regret. We achieve this faster rate via an optimistic gradient scheme with learning rate separation - that is, the method's extrapolation and update steps are tuned to different schedules, depending on the noise profile. Subsequently, to eliminate the need for delicate hyperparameter tuning, we propose a fully adaptive method that attains nearly the same guarantees as its non-adapted counterpart, while operating without knowledge of either the game or of the noise profile.

artificial intelligence, inequality, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.45)
Europe (0.28)

Industry: Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

eb1848290d5a7de9c9ccabc67fefa211-Paper.pdf

Neural Information Processing SystemsMar-14-2026, 07:00:29 GMT

algorithm, convergence, matrix game, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.95)

Add feedback

Efficient Uncoupled Learning Dynamics with $\tilde{O}\!\left(T^{-1/4}\right)$ Last-Iterate Convergence in Bilinear Saddle-Point Problems over Convex Sets under Bandit Feedback

Maiti, Arnab, Zhang, Claire Jie, Jamieson, Kevin, Morgenstern, Jamie Heather, Panageas, Ioannis, Ratliff, Lillian J.

arXiv.org Machine LearningFeb-26-2026

In this paper, we study last-iterate convergence of learning algorithms in bilinear saddle-point problems, a preferable notion of convergence that captures the day-to-day behavior of learning dynamics. We focus on the challenging setting where players select actions from compact convex sets and receive only bandit feedback. Our main contribution is the design of an uncoupled learning algorithm that guarantees last-iterate convergence to the Nash equilibrium with high probability. We establish a convergence rate of $\tilde{O}(T^{-1/4})$ up to polynomial factors in problem parameters. Crucially, our proposed algorithm is computationally efficient, requiring only an efficient linear optimization oracle over the players' compact action sets. The algorithm is obtained by combining techniques from experimental design and the classic Follow-The-Regularized-Leader (FTRL) framework, with a carefully chosen regularizer function tailored to the geometry of the action set of each learner.

artificial intelligence, convergence, machine learning, (12 more...)

arXiv.org Machine Learning

2602.21436

Country: